General Foundations for Studying Masking and Swamping Robustness of Outlier Identifiers

Authors

  • Robert Serfling
  • Shanshan Wang
Abstract

With greatly advanced computational resources, the scope of statistical data analysis and modeling has widened to accommodate pressing new arenas of application. In all such data settings, an important and challenging task is the identification of outliers. In particular, an outlier identification procedure must be robust against the possibilities of masking (an outlier goes undetected as such) and swamping (a nonoutlier is classified as an outlier). Here we provide general foundations and criteria for quantifying the robustness of outlier detection procedures against masking and swamping. This unifies scattered existing results confined to univariate or multivariate data, and extends them to a completely general framework allowing any type of data. For any space X of objects and probability model F on X, we consider a real-valued outlyingness function O(x, F) defined over x in X and a sample version O(x, Xn) based on a sample Xn from X. In this setting, and within a coherent framework, we formulate general definitions of masking breakdown point and swamping breakdown point and develop lemmas for evaluating these robustness measures in practical applications. A brief illustration of how the lemmas are applied is provided for univariate scaled deviation outlyingness.

AMS 2000 Subject Classification: Primary 62G35; Secondary 62-07
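As a rough illustration of the masking phenomenon described in the abstract, the sketch below computes a sample scaled deviation outlyingness of the form O(x, Xn) = |x - location(Xn)| / scale(Xn) on a small made-up sample. The choice of location and scale estimators (mean and standard deviation versus median and MAD), the data values, and the cutoff of 3 are all assumptions made for this example and are not taken from the paper.

```python
import statistics

def mad(sample):
    """Median absolute deviation about the median (no consistency factor)."""
    med = statistics.median(sample)
    return statistics.median([abs(v - med) for v in sample])

def scaled_deviation_outlyingness(x, sample, location, scale):
    """Sample outlyingness O(x, Xn) = |x - location(Xn)| / scale(Xn)."""
    return abs(x - location(sample)) / scale(sample)

# Hypothetical data: ten "regular" points near 10 plus a cluster of three
# contaminants near 50. All values are invented for illustration.
regular = [9.0, 9.5, 10.0, 10.5, 11.0, 9.8, 10.2, 10.1, 9.9, 10.3]
contaminants = [50.0, 51.0, 52.0]
sample = regular + contaminants

CUTOFF = 3.0  # hypothetical threshold: flag x as an outlier if O(x, Xn) > CUTOFF

# Mean/SD version: the contaminant cluster inflates both estimators.
o_classical = scaled_deviation_outlyingness(
    50.0, sample, statistics.mean, statistics.stdev
)

# Median/MAD version: both estimators are driven by the regular points.
o_robust = scaled_deviation_outlyingness(
    50.0, sample, statistics.median, mad
)

print(f"mean/SD outlyingness of 50.0:    {o_classical:.2f}")
print(f"median/MAD outlyingness of 50.0: {o_robust:.2f}")
print("masked under mean/SD:", o_classical <= CUTOFF)
print("flagged under median/MAD:", o_robust > CUTOFF)
```

In this toy sample the point 50.0 falls below the cutoff under the mean/SD version, because the contaminants themselves distort the estimators (masking), while the median/MAD version flags it. The example only illustrates the notion; it does not compute the masking or swamping breakdown points defined in the paper.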

Similar resources

On Masking and Swamping Robustness of Leading Outlier Identifiers for Univariate Data

In the wide-ranging scope of modern statistical data analysis, a key task is identification of outliers. In using an outlier identification procedure, one needs to know its robustness against masking (an “outlier” is undetected) and swamping (a “nonoutlier” is classified as an “outlier”), possibilities which can come about due to the presence of outliers. Study of these issues together is neces...

Nonparametric Depth-Based Multivariate Outlier Identifiers, and Robustness Properties

In extending univariate outlier detection methods to higher dimension, various special issues arise, such as limitations of visualization methods, inadequacy of marginal methods, lack of a natural order, limited scope of parametric modeling, and restriction to ellipsoidal contours when using Mahalanobis distance methods. Here we pass beyond these limitations via an approach based on depth funct...

Nonparametric Depth-Based Multivariate Outlier Identifiers, and Masking Robustness Properties

In extending univariate outlier detection methods to higher dimension, various issues arise: limited visualization methods, inadequacy of marginal methods, lack of a natural order, limited parametric modeling, and, when using Mahalanobis distance, restriction to ellipsoidal contours. To address and overcome such limitations, we introduce nonparametric multivariate outlier identifiers based on m...

A Control Chart Based on Cluster-Regression Adjustment for Retrospective Monitoring of Individual Characteristics

The tendency for experimental and industrial variables to include a certain proportion of outliers has become a rule rather than an exception. These clusters of outliers, if left undetected, have the capability to distort the mean and the covariance matrix of the Hotelling's T² multivariate control charts constructed to monitor individual quality characteristics. The effect of this distortion i...

Detection of Outlier Patches in Autoregressive Time Series

This paper proposes a procedure to detect patches of outliers in an autoregressive process. The procedure is an improvement over the existing detection methods via Gibbs sampling. We show that the standard outlier detection via Gibbs sampling may be extremely inefficient in the presence of severe masking and swamping effects. The new procedure identifies the beginning and end of possible outlier pa...

Publication date: 2012